NumPy Multi-Dimensional Computations
NumPy provides multi-dimensional arrays, along with a large collection of mathematical operations and algorithms for working with these arrays. Its core functionality is this array data structure and the methods for manipulating it, which together act as an efficient and flexible container when analysing data sets. Various packages build on NumPy for applications in specialized domains, such as Pandas and Statsmodels for data analysis, TensorFlow and PyTorch for deep learning, and OpenCV for image processing. These notes draw on the documentation of the respective packages, as well as "Python For Data Analysis: Data Wrangling With Pandas, NumPy, And Jupyter", 3rd Edition, by Wes McKinney (creator and developer of Pandas) in 2022, and "Python Data Science Handbook: Essential Tools For Working With Data", 2nd Edition, by Jake VanderPlas in 2022.
When using Python for data analysis and data science, the most common packages and libraries include NumPy as the numeric library underpinning the calculations; Pandas as the cornerstone of data manipulation; Matplotlib, Seaborn, and Plotly for intricate visualizations; Statsmodels for advanced statistical functions; SciPy for advanced scientific computing; Scikit-Learn as a toolkit for machine learning; and TensorFlow and PyTorch for artificial intelligence applications. For convenience, Anaconda provides a distribution of Python with pre-installed packages focused on data analysis and data science. Beyond these, the large and open ecosystem of Python packages can be leveraged depending on the requirements of a project. It should be kept in mind that working in Python differs from traditional data analysis tools such as Microsoft Excel and Tableau, which are primarily visual with point-and-click interfaces and become difficult to use when processing very large data sets.
Installation And Setup
NumPy (short for "Numerical Python") is the fundamental package for scientific computing with Python. It achieves this through efficient multi-dimensional arrays; numerical computing tools with element-wise arithmetic; advanced mathematical operations and algorithms; flexibility and ease of use through both high-level and low-level syntax; support for a wide range of hardware and computing platforms; performance from a core written in optimized C; and the use of established linear algebra libraries (BLAS and LAPACK). NumPy also provides a C API that allows other packages (and native C or C++ code) to use its arrays as containers and access its data structures and computational facilities, which is useful for wrapping legacy codebases. Through its growth and open accessibility, NumPy has become an integral tool for quantum computing, statistical computing, signal processing, image processing, graphs and networks, astronomy, cognitive psychology, bioinformatics, mathematics, chemistry, geoscience, geographic processing, architecture, engineering, and machine learning.
The only prerequisite for installing NumPy is Python. NumPy is usually installed through a package manager, conventionally Pip or Conda (the package manager of Anaconda), or alternatively through the native package manager of a Linux distribution (although that version may be outdated or not officially maintained). Advanced developers can also build and install NumPy from its source code, with control over the compilation options. Once installed, NumPy can be imported into a project, conventionally under the alias np.
pip install numpy
pip install --upgrade numpy
conda install numpy
conda update numpy
import numpy
import numpy as np
...
numpy.set_printoptions (precision = 4, suppress = True)
...
...
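As a quick check after installation, the following sketch (assuming a standard Python 3 environment with NumPy installed) prints the installed version through the __version__ attribute and applies the print options from the setup lines, so that floats show 4 decimal places and very small values are not printed in scientific notation.

```python
import numpy as np

# Confirm which NumPy version is installed
print("NumPy version:", np.__version__)

# Show 4 decimal places and avoid scientific notation for small values
np.set_printoptions(precision=4, suppress=True)

values = np.array([1 / 3, 2.0e-6, 123.456789])
print(values)  # the tiny value is printed as a plain rounded decimal
```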
Array ...And Scalar... Creation
An array can be created in multiple dimensions and, conventionally, forms a scalar with 0 dimensions, a vector with 1 dimension (as either a row or a column, although technically there is no distinction), a matrix with 2 dimensions (as a collection of rows and columns), or a tensor with 3 or more dimensions (as a collection of layers or pages, rows, and columns). In NumPy terminology, dimensions are referred to as axes. The array is the fundamental data structure of the library, and all of its elements share a single, homogeneous data type (every item takes up the same size block of memory, and all blocks are interpreted the same way based on the data type). Mixing types either upcasts the elements to a common type or falls back to the generic object type, in which case the mathematical operations performed may be extremely inefficient. To create an array, an object is passed to the array function, where this object is usually a number for a scalar, a list for a vector, a list of lists for a matrix, or a list of lists of lists for a tensor, with the nesting levels respectively inferred as layers, rows, and columns. It is also possible to create an array automatically populated with zeros, ones, a range of numbers, or uninitialized (arbitrary) values. Considering the difference between views (sharing the same data buffer in memory) and copies (duplicating the data buffer in memory), different arrays can also share the same data, so that changes made through one array are reflected in the other.
Scalar.
These arrays are efficient and generic containers because NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. The library of operations and algorithms can then work on this memory without type checking or other overheads. This mechanism also allows complex computations to be performed on an entire array without Python loops. As mentioned, these operations and algorithms are implemented in compiled C code (with linear algebra delegated to BLAS and LAPACK), which allows them to execute quickly and robustly (generally 10 to 100 or more times faster than pure Python, while using significantly less memory). The relevant data types for arrays include int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16 (half precision), float32 (single precision), float64 (double precision), float128 (extended precision), complex64, complex128, complex256, bool, and object, although float128 and complex256 are platform-dependent and not available on every system (for example, Windows).
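A minimal sketch of how the single-dtype rule plays out, using only standard NumPy calls. The astype method and the itemsize and nbytes attributes are not mentioned in the notes above but are part of the ndarray API. Mixing integers and floats upcasts to float64, mixing numbers and strings produces a Unicode string array, and the object type has to be requested explicitly here.

```python
import numpy as np

# Integers only: one integer dtype is chosen for every element
ints = np.array([1, 2, 3])

# Integers mixed with a float are upcast to a common floating-point type
mixed = np.array([1, 2.5, 3])

# Numbers mixed with a string are all converted to a Unicode string type
strings = np.array([1, "two", 3.0])

# The generic object type stores Python objects and loses most of the speed benefit
objects = np.array([1, "two", 3.0], dtype=object)

# An explicit dtype controls memory use: float32 needs 4 bytes per element
small = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# astype returns a new array; float-to-int conversion truncates toward zero
as_int8 = mixed.astype(np.int8)

print(ints.dtype, mixed.dtype, strings.dtype, objects.dtype, small.dtype)
print(small.itemsize, small.nbytes, as_int8)
```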
numpy.array (object, dtype = None, *, copy = True, order = "K", subok = False, ndmin = 0, like = None)
object = [D0, D1, D2, D3, D4, D5, D6, D7, D8, D9]  # vector, shape (10,)
object = [[R0C0, R0C1, R0C2, R0C3, R0C4, R0C5, R0C6, R0C7]]  # row matrix, shape (1, 8)
object = [[R0C0], [R1C0], [R2C0], [R3C0], [R4C0], [R5C0], [R6C0], [R7C0]]  # column matrix, shape (8, 1)
object = [[R0C0, R0C1, R0C2], [R1C0, R1C1, R1C2], [R2C0, R2C1, R2C2]]  # matrix, shape (3, 3)
object = [[[R0C0L0, R0C1L0], [R1C0L0, R1C1L0]], [[R0C0L1, R0C1L1], [R1C0L1, R1C1L1]]]  # tensor (layers, rows, columns), shape (2, 2, 2)
numpy.zeros (shape, dtype = float, order = "C", *, like = None)
numpy.ones (shape, dtype = None, order = "C", *, like = None)
numpy.identity (count_rows_columns, dtype = None, *, like = None)
numpy.full (shape, fill_value, dtype = None, order = "C", *, like = None)
numpy.empty (shape, dtype = float, order = "C", *, like = None)
numpy.arange ([start, ] stop, [step, ] dtype = None, *, like = None)
numpy.linspace (start, stop, num = 50, endpoint = True, retstep = False, dtype = None, axis = 0)
numpy.meshgrid (x, y, z, copy = True, sparse = False, indexing = "xy")
The information and properties intrinsic to an array are reflected by its attributes. Common attributes include the rank (ndim) as the number of dimensions, the shape (shape) as the size along each dimension (layers, rows, and columns), the size (size) as the total number of elements, and the data type (dtype) of the elements. ...
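The creation functions and attributes described above can be sketched as follows. The values are arbitrary illustrations, and the loop over named arrays is only a convenience for printing ndim, shape, and size side by side.

```python
import numpy as np

scalar = np.array(7)                        # 0 dimensions
vector = np.array([0, 1, 2, 3])             # 1 dimension
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2 dimensions: rows, columns
tensor = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])       # 3 dimensions: layers, rows, columns

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the rank, shape the size per axis, size the element count
    print(name, arr.ndim, arr.shape, arr.size)

zeros = np.zeros((2, 3))            # filled with 0.0
ones = np.ones(4, dtype=np.int32)   # filled with 1
sevens = np.full((2, 2), 7)         # filled with a chosen value
eye = np.identity(3)                # 3 x 3 identity matrix
steps = np.arange(0, 10, 2)         # like range(): stop is excluded
grid = np.linspace(0, 1, num=5)     # 5 evenly spaced points, stop included
scratch = np.empty((2, 2))          # uninitialized: contents are arbitrary

# meshgrid turns 1-D coordinate vectors into 2-D coordinate matrices
xs, ys = np.meshgrid(np.array([0, 1, 2]), np.array([10, 20]))
print(xs.shape, ys.shape)
```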
An array can be indexed to select a slice using integers and slices (basic, positional indexing), booleans (logical indexing), or integer arrays and lists (advanced indexing). Instead of applying indices recursively (indexing into layers, then into rows, and then into columns), it is possible to specify the indices for layers, rows, and columns directly in one comma-separated group (although it can be helpful to think in terms of axes rather than layers, rows, and columns). Indexing can also be performed with booleans, which is useful when combined with logic functions to filter for specific elements. A useful approach is to use the newaxis object (or None) within an index to expand the dimensions of the resulting selection by a unit-length axis. It should be noted that the Python keywords "and" and "or" do not work with boolean arrays; the element-wise operators & and | (with parentheses around each comparison) are used instead.
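A short sketch of recursive, grouped, and boolean indexing, plus newaxis, on a small 3-D array built with arange and reshape (chosen so each element equals its flat position, which makes the selections easy to verify). It also shows that the keyword and raises a ValueError on multi-element boolean arrays, while & works element-wise.

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)   # 2 layers, 3 rows, 4 columns

# Recursive indexing versus one grouped index: layer 1, row 2, column 3
recursive = tensor[1][2][3]
grouped = tensor[1, 2, 3]

# Slices per axis: every layer, first two rows, last column
corner = tensor[:, :2, -1]

# Boolean (logical) indexing combined with comparisons via &
values = np.array([3, -1, 8, 0, 5])
mask = (values > 0) & (values < 6)
selected = values[mask]

# The keyword `and` asks for a single truth value, which is ambiguous here
raised = False
try:
    (values > 0) and (values < 6)
except ValueError:
    raised = True

# newaxis (or None) inserts a unit-length axis
column = values[:, np.newaxis]   # shape (5, 1)
row = values[None, :]            # shape (1, 5)
```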
A distinction needs to be made between basic indexing and advanced indexing. Basic indexing (integers and slices) selects a regular slice from an array, while advanced indexing (integer or boolean arrays) selects an arbitrary group of elements, which allows indices to be repeated. Under basic indexing, the slice is a view of the original array (it uses the same values in memory), so any modification to the view is reflected in the original array; a copy must be requested explicitly to create an independent object. Under advanced indexing, the selected group is a copy and acts as a new object. In particular, selecting data by boolean indexing and assigning the result to a variable always creates a copy of the data. In addition, elements are traversed in row-major order (all consecutive elements of a row are visited before moving to the next row).
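The view and copy distinction can be checked directly. This sketch uses np.shares_memory (a standard NumPy function not mentioned in the notes) to confirm which selections share the original buffer, and ravel to show the row-major traversal.

```python
import numpy as np

original = np.arange(6)          # [0 1 2 3 4 5]

# Basic indexing (a slice) returns a view that shares memory
view = original[1:4]
view[0] = 100                    # also changes original[1]

# An explicit copy breaks the link
detached = original[1:4].copy()
detached[0] = -1                 # original is unchanged

# Advanced indexing (integer list) returns a copy and may repeat indices
picked = original[[0, 0, 5]]
picked[0] = 999                  # original is unchanged

# Boolean indexing also returns a copy
positives = original[original > 2]

print(original, np.shares_memory(original, view), np.shares_memory(original, picked))

# Row-major order: flattening walks each row before moving to the next one
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ravel())            # [1 2 3 4 5 6]
```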
Basic Array Manipulation
In most cases, there are equivalent functions and methods that can be used to manipulate an array. The shape of an array can be changed, as long as the new shape is compatible with the original shape (the same total number of elements). Transposing is a special form of reshaping, where the axes are reversed or swapped according to a permuted order. Additional values can be inserted into or appended onto an array, and other values can be deleted using the appropriate indices; these functions return a new array rather than modifying the original. Multiple arrays can be concatenated (joined) along an existing axis, where the arrays must have the same shape except along that axis. Likewise, arrays can be stacked along a new axis, where the arrays must all have the same shape (for example, to rebuild arrays that have been split). Conversely, an array can be split into a list of sub-arrays, either in sections of equal size or at specific indices, and either horizontally (column-wise), vertically (row-wise), or in depth (layer-wise).
numpy.reshape (array, shape, order = "C")
numpy.transpose (array, axes = None)
numpy.swapaxes (array, axis_0, axis_1)
numpy.flip (array, axis = None)
numpy.fliplr (array)
numpy.flipud (array)
numpy.insert (array, indices, values, axis = None)
numpy.append (array, values, axis = None)
numpy.delete (array, indices, axis = None)
numpy.concatenate ((array_1, array_2, array_3), axis = 0, out = None, dtype = None, casting = "same_kind")
numpy.stack ((array_1, array_2, array_3), axis = 0, out = None, *, dtype = None, casting = "same_kind")
numpy.hstack ((array_1, array_2, array_3), *, dtype = None, casting = "same_kind")
numpy.vstack ((array_1, array_2, array_3), *, dtype = None, casting = "same_kind")
numpy.dstack ((array_1, array_2, array_3))
numpy.split (array, sections_indices, axis = 0)
numpy.hsplit (array, sections_indices)
numpy.vsplit (array, sections_indices)
numpy.dsplit (array, sections_indices)
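A sketch of the manipulation functions listed above on small arrays, with the resulting shapes noted in comments. Passing -1 to reshape, which lets NumPy infer that axis length, is an extra convenience not covered in the notes.

```python
import numpy as np

a = np.arange(6)                      # [0 1 2 3 4 5]

# Reshape keeps the element count: 6 = 2 x 3; -1 lets NumPy infer one axis
grid = np.reshape(a, (2, 3))
same = a.reshape(3, -1)               # method form, shape (3, 2)

# Transposing reverses the axes; swapaxes exchanges two chosen axes
flipped = np.transpose(grid)          # shape (3, 2)
cube = np.zeros((2, 3, 4))
swapped = np.swapaxes(cube, 0, 2)     # shape (4, 3, 2)

# insert / append / delete return new arrays (the input is not modified)
inserted = np.insert(a, 1, 99)        # [0 99 1 2 3 4 5]
appended = np.append(a, [6, 7])       # [0 1 2 3 4 5 6 7]
deleted = np.delete(a, [0, 2])        # [1 3 4 5]

# concatenate joins along an existing axis: other axes must match
top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])
joined = np.concatenate((top, bottom), axis=0)   # shape (3, 2)
also_joined = np.vstack((top, bottom))           # same result, row-wise

# stack creates a new axis: inputs must have identical shapes
stacked = np.stack((top, top * 10), axis=0)      # shape (2, 2, 2)

# split into equal sections, or at specific indices
halves = np.split(a, 2)               # [0 1 2] and [3 4 5]
pieces = np.split(a, [1, 4])          # [0], [1 2 3], and [4 5]
left, right = np.hsplit(grid, [1])    # column-wise: shapes (2, 1) and (2, 2)
```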
Calculations And Operations
An arithmetic operation can be performed element-wise or matrix-wise. Element-wise operations can be thought of as batch operations, where the operation is applied to each element of the array (often referred to as vectorization); the operators +, -, *, /, %, and ** map to these element-wise functions. Matrix-wise operations usually involve linear algebra, such as matrix multiplication (the @ operator), conjugation, inversion, decompositions, determinants, or eigenvalues. Similarly, logic functions evaluate arrays and return boolean results element-wise, or reduce them to a single result; common logic functions test whether elements are greater than, less than, or equal to one another (and their variants). If the arrays do not have the same shape, they must be broadcastable to a common shape along each dimension. It should be noted that, due to this broadcasting, whenever an operation involves an array and a scalar, an element-wise operation is performed, with the scalar applied to each element of the array.
numpy.add (array_augend, array_addend, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.subtract (array_minuend, array_subtrahend, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.multiply (array_multiplicand, array_multiplier, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.divide (array_dividend, array_divisor, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.mod (array_dividend, array_divisor, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.power (array_base, array_exponent, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.matmul (array_0, array_1, /, out = None, *, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj, axes, axis])
numpy.dot (array_0, array_1, out = None)
numpy.linalg.inv (array)
numpy.linalg.det (array)
numpy.linalg.eigvals (array)
numpy.linalg.solve (array_a, array_b)
Comparison functions (element-wise):
numpy.greater (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.greater_equal (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.less (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.less_equal (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.equal (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.not_equal (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
Truth value testing and array equality:
numpy.all (array, axis = None, out = None, keepdims = <no value>, *, where = <no value>)
numpy.any (array, axis = None, out = None, keepdims = <no value>, *, where = <no value>)
numpy.array_equal (array_0, array_1, equal_nan = False)
Logical operations (element-wise):
numpy.logical_not (array, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.logical_and (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.logical_or (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
numpy.logical_xor (array_0, array_1, /, out = None, *, where = True, casting = "same_kind", order = "K", dtype = None, subok = True [, signature, extobj])
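A sketch contrasting element-wise and matrix-wise operations, broadcasting, a few linear algebra routines, and the logic functions listed above. The operator forms (+, *, **, @, >, &) are shorthand for the corresponding functions; np.linalg.solve is used rather than multiplying by the inverse, which is generally the more accurate way to solve a linear system.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise: each operator maps to a universal function
elementwise_sum = a + b              # same as np.add(a, b)
elementwise_product = a * b          # same as np.multiply(a, b)
squared = a ** 2                     # same as np.power(a, 2)

# Scalars and compatible shapes are broadcast across the array
shifted = a + 10                     # 10 is added to every element
row = np.array([100.0, 200.0])
row_added = a + row                  # row (shape (2,)) is repeated for each row of a

# Matrix-wise: @ is matrix multiplication (same as np.matmul)
product = a @ b

# Linear algebra: solve a x = y without forming the inverse explicitly
y = np.array([5.0, 11.0])
x = np.linalg.solve(a, y)
det = np.linalg.det(a)
inverse = np.linalg.inv(a)

# Comparisons return boolean arrays; all / any reduce them
greater = np.greater(a, 2.5)             # same as a > 2.5
both = np.logical_and(a > 1, a < 4)      # same as (a > 1) & (a < 4)
print(greater.any(), greater.all(), np.array_equal(a, a.copy()))
```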
It should be noted that these functions, which perform element-wise operations on the data in an array, are known as universal functions (ufuncs). They can be thought of as vectorized wrappers for simple functions that work element by element, support broadcasting, and support type casting (unary universal functions act on 1 array, while binary universal functions act on 2 arrays). The direct advantage of universal functions is the ability to replace explicit loops with simple array expressions, which are faster and more efficient (vectorization). In combination with ...regular... functions, many calculations can then be performed for statistics (mean, median, standard deviation, etc.), sorting (numerical, alphabetical, ascending, descending, etc.), and sets (unique, ..., etc.).
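To make the universal-function and statistics points concrete, this sketch compares a ufunc with the explicit loop it replaces and then applies a few statistics, sorting, and set functions. np.sqrt, np.maximum, np.median, np.sort, np.unique, and np.isin are standard NumPy functions, chosen here as representative examples. Note that std computes the population standard deviation by default (ddof = 0).

```python
import math

import numpy as np

data = np.array([4.0, 1.0, 9.0, 1.0, 16.0])

# A unary ufunc applies one operation to every element, with no Python loop
roots = np.sqrt(data)                        # [2. 1. 3. 1. 4.]

# The explicit loop it replaces
loop_roots = np.array([math.sqrt(v) for v in data])

# A binary ufunc combines two arrays element by element
largest = np.maximum(data, np.full(5, 5.0))  # [ 5.  5.  9.  5. 16.]

# Statistics over the whole array, or along an axis
mean, median, std = data.mean(), np.median(data), data.std()
table = data[:4].reshape(2, 2)               # [[4. 1.] [9. 1.]]
column_means = table.mean(axis=0)

# Sorting: np.sort returns a sorted copy (ascending); reverse for descending
ascending = np.sort(data)
descending = np.sort(data)[::-1]

# Sets: unique values (sorted) and membership tests
uniques = np.unique(data)
present = np.isin(data, [1.0, 16.0])
```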
...
...
...
...
...
...
...Constants...
There are several constants which represent ... . These include infinity as inf (with aliases of Inf, Infinity, PINF, and infty), positive infinity as PINF, negative infinity as NINF, not a number as nan (with aliases of NaN and NAN), positive zero as PZERO, negative zero as NZERO, Euler's number as e, Euler's constant as euler_gamma, and pi as pi. Several of these aliases (including Inf, Infinity, infty, NaN, PINF, NINF, PZERO, and NZERO) were removed in NumPy 2.0, so inf, -inf, nan, 0.0, and -0.0 are the portable spellings. ...
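A sketch of how the special values behave, restricted to names that exist in both NumPy 1.x and 2.x (inf, nan, e, euler_gamma, pi). np.isnan, np.isinf, np.nansum, and np.signbit are standard NumPy functions added here for illustration.

```python
import numpy as np

# Infinities compare beyond every finite float
print(np.inf > 1e308, -np.inf < -1e308)

# NaN is never equal to anything, itself included, so test with isnan
print(np.nan == np.nan, np.isnan(np.nan), np.isinf(-np.inf))

# NaN propagates through ordinary reductions; nan-aware variants skip it
values = np.array([1.0, np.nan, 3.0])
total, nan_total = values.sum(), np.nansum(values)

# Positive and negative zero compare equal but keep different sign bits
print(0.0 == -0.0, np.signbit(0.0), np.signbit(-0.0))

print(np.e, np.euler_gamma, np.pi)
```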
Structured Array
A structured array is an array whose data type contains ..., where each sub-type is a field with a name, a type, and an optional title.
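A minimal structured array, assuming a hypothetical record layout of a name, an age, and a weight (the field names and data are invented for illustration). The weight field also carries an optional title, given as a (title, name) pair.

```python
import numpy as np

# Each field has a name and a type; "weight" also has an optional title.
# The layout and the records are hypothetical examples.
person_type = np.dtype([
    ("name", "U10"),                            # Unicode string, up to 10 characters
    ("age", "i4"),                              # 32-bit integer
    (("Body weight in kg", "weight"), "f8"),    # 64-bit float with a title
])

people = np.array(
    [("Alice", 31, 61.5), ("Bob", 45, 82.0), ("Carol", 27, 55.3)],
    dtype=person_type,
)

print(people.dtype.names)             # field names only; titles are not listed
ages = people["age"]                  # indexing by field name gives that column
first = people[0]                     # indexing by position gives one record
older = people[people["age"] > 30]["name"]
```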
https://numpy.org/doc/stable/user/basics.rec.html
Pseudo-Random Number Generation
In addition to Python's built-in random module, NumPy's pseudo-random number generator allows the creation of random samples of values from different probability distributions. Common examples of these distributions include uniform, normal, beta, chi-square, and gamma. In terms of mechanisms, Generator includes algorithmic improvements and serves as the replacement for RandomState (which is legacy and no longer developed further). There is also functionality for permuting or sampling from an array. It should be noted that the results are deterministic and reproducible based on the seed used for the initial state (although there is no guarantee that the same seed produces the same stream across NumPy versions).
Construct a new random number generator (Generator) with a modified seed and configuration:
numpy.random.default_rng (seed = 12345)
numpy.random.Generator.random (size = None, dtype = numpy.float64, out = None)
numpy.random.Generator.uniform (low = 0.0, high = 1.0, size = None)
numpy.random.Generator.standard_normal (size = None, dtype = numpy.float64, out = None)
numpy.random.Generator.normal (loc = 0.0, scale = 1.0, size = None)  # loc is the mean, scale is the standard deviation
numpy.random.Generator.integers (low, high = None, size = None, dtype = numpy.int64, endpoint = False)
numpy.random.Generator.binomial (parameter_n, parameter_p, size = None)
numpy.random.Generator.beta (alpha, beta, size = None)
numpy.random.Generator.chisquare (degrees_freedom, size = None)
numpy.random.Generator.gamma (shape, scale = 1.0, size = None)
numpy.random.Generator.shuffle (array, axis = 0)
numpy.random.Generator.permuted (array, axis = None, out = None)
numpy.random.Generator.permutation (array, axis = 0)
numpy.random.Generator.choice (array, size = None, replace = True, p = None, axis = 0, shuffle = True)  # p holds the probabilities
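A sketch of the Generator workflow using the seed shown above. The distributions and sizes are arbitrary; the re-seeded generator demonstrates reproducibility, which is only expected to hold within the same NumPy version.

```python
import numpy as np

# A seeded Generator makes the stream reproducible
rng = np.random.default_rng(seed=12345)

uniform = rng.random(3)                          # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 2))
dice = rng.integers(1, 7, size=10)               # high is exclusive: values 1..6
coins = rng.binomial(10, 0.5, size=5)            # successes out of 10 trials

# Re-creating the generator with the same seed repeats the same numbers
again = np.random.default_rng(seed=12345).random(3)

deck = np.arange(10)
shuffled = rng.permutation(deck)                 # returns a shuffled copy
rng.shuffle(deck)                                # shuffles in place
sample = rng.choice(np.array([10, 20, 30]), size=5, p=[0.2, 0.3, 0.5])
no_repeats = rng.choice(10, size=4, replace=False)
```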
Saving And Loading
It is possible to save and load data in binary or text formats. The default is NumPy's own uncompressed binary format, with npy as the extension, which stores a single array together with a header describing its shape and data type. It is also possible to save multiple arrays into an uncompressed or compressed zip archive, with npz as the extension. Alternatively, the data can be saved in a common text format, where the delimiter, newline character, header, footer, comment prefix, and encoding can be specified.
numpy.save (file_name, array, allow_pickle = True, fix_imports = True)
numpy.savez (file_name, arr_0 = array_0, arr_1 = array_1, *args, **kwds)
numpy.savez_compressed (file_name, arr_0 = array_0, arr_1 = array_1, *args, **kwds)
numpy.load (file_name, mmap_mode = None, allow_pickle = False, fix_imports = True, encoding = "ASCII", *, max_header_size = 10000)
Save to txt, csv, tsv, or other delimited files:
numpy.savetxt (file_name, X, fmt = "%.18e", delimiter = " ", newline = "\n", header = "", footer = "", comments = "# ", encoding = None)
Load from txt, csv, tsv, or other delimited files:
numpy.loadtxt (file_name, dtype = <class "float">, comments = "#", delimiter = None, converters = None, skiprows = 0, usecols = None, unpack = False, ndmin = 0, encoding = "bytes", max_rows = None, *, quotechar = None, like = None)
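A round-trip sketch for the three formats, writing into a temporary directory (via the standard tempfile and os modules) so nothing is left behind; the file names are hypothetical. np.load on an .npz file returns a lazily loaded archive, used here as a context manager so the file is closed afterwards.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)
labels = np.array([1, 0, 1])

with tempfile.TemporaryDirectory() as folder:
    # .npy: one array, binary, with shape and dtype preserved
    npy_path = os.path.join(folder, "data.npy")
    np.save(npy_path, data)
    restored = np.load(npy_path)

    # .npz: several named arrays in one compressed zip archive
    npz_path = os.path.join(folder, "bundle.npz")
    np.savez_compressed(npz_path, features=data, targets=labels)
    with np.load(npz_path) as bundle:
        names = sorted(bundle.files)
        targets = bundle["targets"]

    # Text: human-readable, with an explicit format, delimiter, and header
    csv_path = os.path.join(folder, "data.csv")
    np.savetxt(csv_path, data, fmt="%.2f", delimiter=",", header="a,b,c,d")
    # The header is written after the default comment prefix "# ",
    # so loadtxt skips it when reading back
    from_text = np.loadtxt(csv_path, delimiter=",")
```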